Author Profiling for Arabic Tweets based on n-grams
Abstract
This paper presents an approach for profiling unknown authors from the texts they produce on social media. In particular, we address the identification of two profile dimensions, gender and language variety, for Arabic Twitter users based on their tweets. Our approach applies a meta-classification technique to features extracted from the tweet body. We explored two main sets of features: character n-grams and word n-grams. The proposed approach allowed us to reach promising results for both language variety and gender identification.
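As an illustration of the two feature sets named in the abstract, the following minimal sketch extracts character and word n-grams from tweet text and counts them per author. The abstract does not state the n-gram orders, the tokenization, or the classifiers used, so the function names, the whitespace tokenizer, and the default orders here are assumptions made only for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, using whitespace tokenization (an assumption)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tweets, char_n=3, word_n=2):
    """Count character and word n-grams over all tweets of one author.

    Keys are prefixed with "c:" or "w:" so the two feature sets stay separate.
    The default orders (3 and 2) are illustrative, not taken from the paper.
    """
    counts = Counter()
    for tweet in tweets:
        counts.update("c:" + gram for gram in char_ngrams(tweet, char_n))
        counts.update("w:" + gram for gram in word_ngrams(tweet, word_n))
    return counts
```

Count vectors of this kind could then be passed to one or more base classifiers whose outputs are combined by a meta-classifier; that stage is left out here because the abstract gives no details about it.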
Similar resources
Using TF-IDF n-gram and Word Embedding Cluster Ensembles for Author Profiling
This paper presents our approach and results for the 2017 PAN Author Profiling Shared Task. Language-specific corpora were provided for four languages: Spanish, English, Portuguese, and Arabic. Each corpus consisted of tweets authored by a number of Twitter users labeled with their gender and the specific variant of their language which was used in the documents (e.g. Brazilian or European Port...
Using Character n-grams and Style Features for Gender and Language Variety Classification
Author profiling is the problem of determining the characteristics of an author of an anonymous text. In this paper, we detail a method to determine the language variety and the gender of the authors of tweets, as a submission for the Author Profiling Task at PAN 2017. This method seeks to select the most significant character n-grams for each class considered, combining them with style feature...
Topic Models and n-gram Language Models for Author Profiling - Notebook for PAN at CLEF 2015
Author profiling is the task of determining the attributes for a set of authors. This paper presents the design, approach, and results of our submission to the PAN 2015 Author Profiling Shared Task. Four corpora, each in a different language, were provided. Each corpus consisted of collections of tweets for a number of Twitter users whose gender, age and personality scores are known. The task wa...
Tweets Classification using Corpus Dependent Tags, Character and POS N-grams
This paper is part of the Author Profiling task at the PAN 2015 contest, in which participants had to predict the gender, age and personality traits of Twitter users in four different languages (Spanish, English, Italian and Dutch). Our approach takes into account stylistic features represented by character N-grams and POS N-grams to classify tweets. The main idea of using character N-grams is to ext...
Opinion Analysis for Twitter and Arabic Tweets: a Systematic Literature Review Mnahel
The objective of this paper is to present the current evidence on Twitter opinion mining in general and the current state of opinion mining for Arabic tweets. The researcher performed a systematic literature review (SLR) to investigate the features and methods used for Twitter opinion mining and whether those features and methods have been used for opinion mining of Arabic tweets. Sixty-five pap...
Publication date: 2017